AITopics | Banska Bystrica

Collaborating Authors

skLEP: A Slovak General Language Understanding Benchmark

Šuppa, Marek, Ridzik, Andrej, Hládek, Daniel, Javůrek, Tomáš, Ondrejová, Viktória, Sásiková, Kristína, Tamajka, Martin, Šimko, Marián

arXiv.org Artificial Intelligence, Jun-27-2025

In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hope of fostering reproducibility and driving future research in Slovak NLU.

benchmark, large language model, machine learning

arXiv.org Artificial Intelligence

2506.21508

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government (0.46)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Double machine learning for sample selection models

Bia, Michela, Huber, Martin, Lafférs, Lukáš

arXiv.org Machine Learning, Dec-9-2020

This paper considers treatment evaluation when outcomes are only observed for a subpopulation due to sample selection or outcome attrition/non-response. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. To control in a data-driven way for potentially high-dimensional pre-treatment covariates that motivate the selection-on-observables assumptions, we adapt the double machine learning framework to sample selection problems. That is, we make use of (a) Neyman-orthogonal and doubly robust score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners and investigate their finite sample properties in a simulation study. The estimator is available in the causalweight package for the statistical software R.

Keywords: sample selection, double machine learning, doubly robust estimation, efficient score.
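The combination of a doubly robust score with cross-fitting described in the abstract can be sketched for the selection-on-observables case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation and not the causalweight API: the function names (`simulate`, `fit_nuisances`, `score`, `dml_ate`), the toy data-generating process, and the use of cell means on a single binary covariate in place of machine learners are all hypothetical choices made for the example. The score used is the standard doubly robust form for outcomes missing at random, mu(d, 1, x) + 1{D = d} S (Y - mu(d, 1, x)) / (p_d(x) pi(d, x)), where p_d(x) is the treatment propensity, pi(d, x) the selection probability, and mu(d, 1, x) the conditional mean outcome among selected units.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Toy data (assumed for illustration): binary covariate X, treatment D,
    selection indicator S, and outcome Y that is observed only when S == 1.
    Selection depends on X and D only, so selection-on-observables holds.
    The true average treatment effect is 1.0 by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0
        s = 1 if rng.random() < 0.5 + 0.2 * x + 0.2 * d else 0
        y = 1.0 * d + 2.0 * x + rng.gauss(0, 1)
        rows.append((x, d, s, y if s == 1 else None))
    return rows


def fit_nuisances(train):
    """Cell-mean stand-ins for the machine learners:
    p[(d, x)]  ~ Pr(D = d | X = x)
    pi[(d, x)] ~ Pr(S = 1 | D = d, X = x)
    mu[(d, x)] ~ E[Y | D = d, S = 1, X = x]"""
    n_x = defaultdict(int)
    n_dx = defaultdict(int)
    n_sdx = defaultdict(int)
    y_sum = defaultdict(float)
    for x, d, s, y in train:
        n_x[x] += 1
        n_dx[(d, x)] += 1
        if s == 1:
            n_sdx[(d, x)] += 1
            y_sum[(d, x)] += y
    mu = {key: y_sum[key] / n_sdx[key] for key in n_sdx}
    p = {key: n_dx[key] / n_x[key[1]] for key in n_dx}
    pi = {key: n_sdx[key] / n_dx[key] for key in n_dx}
    return p, pi, mu


def score(row, d, p, pi, mu):
    """Doubly robust score for the mean potential outcome under treatment d."""
    x, d_i, s, y = row
    m = mu[(d, x)]
    if d_i == d and s == 1:
        # Inverse-probability-weighted residual correction, applied only to
        # units with treatment d whose outcome is observed.
        return m + (y - m) / (p[(d, x)] * pi[(d, x)])
    return m


def dml_ate(rows, k=3):
    """Cross-fitting: nuisances are fitted on k-1 folds and the score is
    evaluated on the held-out fold; the scores are then averaged."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        p, pi, mu = fit_nuisances(train)
        scores.extend(
            score(r, 1, p, pi, mu) - score(r, 0, p, pi, mu) for r in held_out
        )
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = simulate(4000, seed=0)
    print("estimated ATE:", dml_ate(data, k=3))
```

With high-dimensional covariates, the cell means in `fit_nuisances` would be replaced by flexible learners such as lasso or random forests; the fold structure and the score would stay the same. The instrumental-variable variant mentioned in the abstract requires a different score and is not covered by this sketch.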

assumption, assumption 3, selection

arXiv.org Machine Learning

2012.00745

Country:

Europe > Switzerland > Fribourg > Fribourg (0.04)
South America > Colombia (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
